Functional Graphical Models: Structure Enables Offline Data-Driven Optimization
While machine learning models are typically trained to solve prediction
problems, we might often want to use them for optimization problems. For
example, given a dataset of proteins and their corresponding fluorescence
levels, we might want to optimize for a new protein with the highest possible
fluorescence. This kind of data-driven optimization (DDO) presents a range of
challenges beyond those in standard prediction problems, since we need models
that successfully predict the performance of new designs that are better than
the best designs seen in the training set. It is not clear theoretically when
existing approaches can even perform better than the naive approach that simply
selects the best design in the dataset. In this paper, we study how structure
can enable sample-efficient data-driven optimization. To formalize the notion
of structure, we introduce functional graphical models (FGMs) and show
theoretically how they can provide for principled data-driven optimization by
decomposing the original high-dimensional optimization problem into smaller
sub-problems. This allows us to derive much more practical regret bounds for
DDO, and the result implies that DDO with FGMs can achieve nearly optimal
designs in situations where naive approaches fail due to insufficient coverage
of the offline data. We further present a data-driven optimization algorithm
that infers the FGM structure itself, either over the original input variables
or over a latent-variable representation of the inputs.
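To illustrate the decomposition idea, here is a minimal sketch assuming a known
FGM whose cliques are disjoint subsets of a small discrete design space; the
names (cliques, subfunctions, the brute-force optimizer) are illustrative
assumptions, not the paper's implementation:

    import itertools
    import numpy as np

    # Assumed FGM: the objective factorizes as f(x) = sum_k f_k(x[S_k]),
    # where each S_k is a small subset (clique) of the input coordinates.
    cliques = [(0, 1), (2,), (3, 4)]   # assumed structure
    values = [0, 1]                    # assumed discrete design space

    def optimize_with_fgm(subfunctions, cliques, dim):
        """Optimize each learned subfunction over its own clique; valid
        here because the cliques are disjoint, so sub-problems decouple."""
        x = np.zeros(dim, dtype=int)
        for f_k, S_k in zip(subfunctions, cliques):
            # Brute force over the low-dimensional sub-problem: the search
            # space is |values|^|S_k| rather than |values|^dim jointly.
            best = max(itertools.product(values, repeat=len(S_k)), key=f_k)
            for i, v in zip(S_k, best):
                x[i] = v
        return x

Each sub-problem only needs data coverage over its own clique, which is the
intuition behind the improved regret bounds the abstract refers to.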
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Effective offline RL methods require properly handling out-of-distribution
actions. Implicit Q-learning (IQL) addresses this by training a Q-function
using only dataset actions through a modified Bellman backup. However, it is
unclear which policy actually attains the values represented by this implicitly
trained Q-function. In this paper, we reinterpret IQL as an actor-critic method
by generalizing the critic objective and connecting it to a
behavior-regularized implicit actor. This generalization shows how the induced
actor balances reward maximization and divergence from the behavior policy,
with the specific loss choice determining the nature of this tradeoff. Notably,
this actor can exhibit complex and multimodal characteristics, suggesting
issues with the conditional Gaussian actor fit with advantage weighted
regression (AWR) used in prior methods. Instead, we propose sampling candidate
actions from a diffusion-parameterized behavior policy and reweighting them
with importance weights computed from the critic to recover our intended
policy. We introduce Implicit Diffusion
Q-learning (IDQL), combining our general IQL critic with the policy extraction
method. IDQL maintains the ease of implementation of IQL while outperforming
prior offline RL methods and demonstrating robustness to hyperparameters. Code
is available at https://github.com/philippe-eecs/IDQL.
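A minimal sketch of the policy-extraction step as the abstract describes it.
The interfaces behavior_model.sample, critic_q, and critic_v are assumed
placeholders for the trained networks, and the exponentiated-advantage
weighting is one simple choice, not necessarily the paper's exact scheme:

    import numpy as np

    def extract_action(state, behavior_model, critic_q, critic_v,
                       num_samples=32, temperature=1.0, rng=None):
        """Sample candidates from a (diffusion-parameterized) behavior
        policy, weight them by the critic, and resample one action.
        All interfaces here are assumed, not the paper's actual API."""
        rng = rng or np.random.default_rng()
        # 1) Draw N candidate actions from the learned behavior policy.
        actions = behavior_model.sample(state, num_samples)  # (N, act_dim)
        # 2) Score candidates with the critic; an advantage-style weight
        #    Q(s, a) - V(s) is one natural choice under implicit Q-learning.
        adv = np.array([critic_q(state, a) - critic_v(state)
                        for a in actions])
        weights = np.exp(adv / temperature)
        weights /= weights.sum()
        # 3) Importance-resample: pick a candidate in proportion to weight.
        idx = rng.choice(len(actions), p=weights)
        return actions[idx]

Because the candidates come from the behavior policy itself, every selected
action stays in-distribution, which is the point of the construction.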
Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL
The necessity for cooperation among intelligent machines has popularised
cooperative multi-agent reinforcement learning (MARL) in the artificial
intelligence (AI) research community. However, many research endeavors have
been focused on developing practical MARL algorithms whose effectiveness has
been studied only empirically, thereby lacking theoretical guarantees. As
recent studies have revealed, MARL methods often achieve performance that is
unstable in terms of reward monotonicity or suboptimal at convergence. To
resolve these issues, in this paper, we introduce a novel framework named
Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for
MARL algorithmic designs. We prove that algorithms derived from the HAML
template satisfy the desired properties of the monotonic improvement of the
joint reward and the convergence to Nash equilibrium. We verify the
practicality of HAML by proving that the current state-of-the-art cooperative
MARL algorithms, HATRPO and HAPPO, are in fact HAML instances. Next, as a
natural outcome of our theory, we propose HAML extensions of two well-known RL
algorithms, HAA2C (for A2C) and HADDPG (for DDPG), and demonstrate their
effectiveness against strong baselines on StarCraftII and Multi-Agent MuJoCo
tasks.
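As a loose illustration of the sequential update pattern that HAML-style
algorithms share (a sketch of the template, not the paper's formal mirror
operator; advantage, divergence, and optimize are assumed interfaces):

    def haml_style_update(policies, advantage, divergence, optimize, eta=1.0):
        """One heterogeneous-agent update round, sketched loosely after the
        HAML template: agents update sequentially, each trading off its
        advantage (given predecessors' new policies) against a drift
        penalty from its current policy. All callables are assumed."""
        new_policies = {}
        for agent in sorted(policies):      # any fixed update order
            def objective(candidate, agent=agent):
                # Expected advantage conditioned on agents updated so far,
                # minus a mirror-style penalty keeping the step conservative.
                return (advantage(agent, candidate, new_policies)
                        - eta * divergence(candidate, policies[agent]))
            new_policies[agent] = optimize(objective)
        return new_policies

The divergence penalty is what makes each step conservative enough to preserve
the monotonic joint-reward improvement the abstract claims for HAML instances.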
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Large sequence models (SMs) such as the GPT series and BERT have displayed
outstanding performance and generalization capabilities on vision, language,
and recently reinforcement learning tasks. A natural follow-up question is how
to abstract multi-agent decision making into an SM problem and benefit from the
prosperous development of SMs. In this paper, we introduce a novel architecture
named Multi-Agent Transformer (MAT) that effectively casts cooperative
multi-agent reinforcement learning (MARL) into an SM problem wherein the task
is to map the agents' observation sequence to the agents' optimal action
sequence. Our
goal is to build the bridge between MARL and SMs so that the modeling power of
modern sequence models can be unleashed for MARL. Central to our MAT is an
encoder-decoder architecture which leverages the multi-agent advantage
decomposition theorem to transform the joint policy search problem into a
sequential decision making process; this renders only linear time complexity
for multi-agent problems and, most importantly, endows MAT with monotonic
performance improvement guarantee. Unlike prior arts such as Decision
Transformer, which fit only pre-collected offline data, MAT is trained by
online trial and error in the environment in an on-policy fashion. To validate
MAT, we conduct extensive experiments on StarCraftII, Multi-Agent MuJoCo,
Dexterous Hands Manipulation, and Google Research Football benchmarks. Results
demonstrate that MAT achieves superior performance and data efficiency compared
to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that
MAT is an excellent few-shot learner on unseen tasks regardless of changes in
the number of agents. See our project page at
https://sites.google.com/view/multi-agent-transformer
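A minimal sketch of the autoregressive decoding the abstract describes, with
assumed encoder/decoder interfaces: observations are encoded jointly, then
actions are decoded one agent at a time, each conditioned on the actions
already chosen:

    def mat_act(observations, encoder, decoder):
        """Autoregressive action selection in the MAT style (assumed
        interfaces): encode all agents' observations jointly, then decode
        actions agent by agent, conditioning each on its predecessors'
        actions. One decoder pass per agent gives the linear-in-agents
        complexity the abstract mentions."""
        obs_repr = encoder(observations)   # joint observation representation
        actions = []
        for i in range(len(observations)):
            # Decode agent i's action given the joint representation and
            # the actions of agents 1..i-1 chosen so far.
            a_i = decoder(obs_repr, actions, agent_index=i)
            actions.append(a_i)
        return actions

The multi-agent advantage decomposition theorem is what justifies choosing
actions in this fixed sequential order rather than searching the joint space.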